Bayesian Sequential Designs
Jörn Alexander Quent
29 July 2020
What will I talk about?
- I want to show you a way to collect data that produces strong evidence but saves resources:
Bayesian Sequential Designs
- Basics: Frequentist vs. Bayesian statistics.
- Much of what I say is based on this fantastic paper:

Let’s talk about the frequentist way
- Let’s consider a simple one-sample t-test:
\[t = \frac{\overline{x} - \mu}{s_\overline{x}}\] where \[ s_\overline{x} = \frac{s}{\sqrt{n}} \] where
\(\mu\) = Proposed constant for the population mean
\(\overline{x}\) = Sample mean
\(n\) = Sample size (i.e., number of observations)
\(s\) = Sample standard deviation
\(s_\overline{x}\) = Estimated standard error of the mean
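As a quick sketch (in Python, for illustration; the analyses in this talk use R), the two formulas above can be computed directly:

```python
import math
import statistics

def one_sample_t(x, mu=0.0):
    """t statistic for a one-sample t-test against the constant mu."""
    n = len(x)
    xbar = statistics.mean(x)        # sample mean
    s = statistics.stdev(x)          # sample standard deviation
    se = s / math.sqrt(n)            # estimated standard error of the mean
    return (xbar - mu) / se

# Example: five observations tested against mu = 0
print(round(one_sample_t([0.2, 0.5, 0.1, 0.4, 0.3], mu=0.0), 3))  # → 4.243
```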
Let’s talk about the frequentist way
Our toy example:

- The result is \(p(data | H_0) = 0.011\).
- In other words, the p-value is the probability of observing data at least as extreme as ours if the null hypothesis were true.
- Note that a non-significant p-value doesn’t mean that \(H_0\) is true,
- and a significant p-value doesn’t mean that there is a meaningful effect.
A Bayesian way
- With Bayesian statistics, we can make completely different claims:
\[\underbrace{\frac{p(H_1 | data)}{p(H_0 | data)}}_\text{Posterior plausibility
about hypotheses} = \underbrace{\frac{p(H_1)}{p(H_0)}}_\text{Prior plausibility
about hypotheses} \times \underbrace{\frac{p(data| H_1)}{p(data| H_0)}}_\text{Bayes factor =
Predictive updating factor}\]
Evidence in favour of alternative:
\[ BF_{10} = \frac{p(data| H_1)}{p(data| H_0)}\]
Evidence in favour of null: \[ BF_{01} = \frac{p(data| H_0)}{p(data| H_1)}\]
This means that you can simply invert them: \[ BF_{10} = 1/BF_{01}\]
Convention for Bayes Factors
| \(BF_{10}\) | Evidence |
|---|---|
| > 100 | Extreme evidence for \(H_1\) |
| 30 – 100 | Very strong evidence for \(H_1\) |
| 10 – 30 | Strong evidence for \(H_1\) |
| 3 – 10 | Moderate evidence for \(H_1\) |
| 1 – 3 | Anecdotal evidence for \(H_1\) |
| 1 | No evidence |
| 1 – 1/3 | Anecdotal evidence for \(H_0\) |
| 1/3 – 1/10 | Moderate evidence for \(H_0\) |
| 1/10 – 1/30 | Strong evidence for \(H_0\) |
| 1/30 – 1/100 | Very strong evidence for \(H_0\) |
| < 1/100 | Extreme evidence for \(H_0\) |
- Most journals require a Bayes factor of at least 6 or 10 in favour of the hypothesis.
- However, in practice evidence for \(H_0\) is often more difficult to obtain than for \(H_1\).
- For this talk, I accept \(BF_{10} > 10\) as evidence for \(H_1\) and \(BF_{10} < 1/6\) as evidence for \(H_0\).
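The convention above is easy to encode. A small Python sketch (the function name `evidence_label` is mine), which also exploits the inversion \(BF_{01} = 1/BF_{10}\):

```python
def evidence_label(bf10):
    """Conventional evidence category for a Bayes factor BF10."""
    if bf10 < 1:  # BF01 = 1 / BF10: same scale, but evidence for H0 instead of H1
        return evidence_label(1 / bf10).replace("H1", "H0")
    for cut, label in [(100, "extreme"), (30, "very strong"),
                       (10, "strong"), (3, "moderate"), (1, "anecdotal")]:
        if bf10 > cut:
            return f"{label} evidence for H1"
    return "no evidence"

print(evidence_label(3.24))  # moderate evidence for H1
print(evidence_label(0.05))  # 1/0.05 = 20, i.e. strong evidence for H0
```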
A Bayesian way
- We can examine how probable our hypothesis is given the data for our toy example.
```
## Bayes factor analysis
## --------------
## [1] Alt., r=0.707 : 3.236673 ±0%
##
## Against denominator:
##   Null, mu = 0
## ---
## Bayes factor type: BFoneSample, JZS
```
- Our \(H_1\) is only 3.2 times more likely than \(H_0\), which is only anecdotal evidence.
A Bayesian way
- Bayes factors are not ground truth and they depend on our priors and our alternative models.
- For instance, the Bayesian t-test (`ttestBF`) assumes a Cauchy prior with scale \(r = 0.707\) on the standardised effect size (Rouder et al., 2009).
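As a sketch, that default prior on the effect size can be evaluated directly; this Python snippet (names are mine) uses the scale \(r = 0.707\) that also appears in the `ttestBF` output above:

```python
import math

def cauchy_prior(delta, r=0.707):
    """Density of a Cauchy prior on effect size delta with scale r
    (r = 0.707 is ttestBF's default for the JZS prior)."""
    return 1 / (math.pi * r * (1 + (delta / r) ** 2))

print(round(cauchy_prior(0.0), 3))  # ≈ 0.45, the density at delta = 0
```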
Traditional design example
This shows that a large sample size is necessary to achieve a probability of 80% that \(BF_{10} > 10\) or \(BF_{10} < 1/6\).
| 0.5 | 72 | 0.0003 | 605 | 39600 |
| 0.0 | 232 | 0.0011 | 1949 | 127600 |
Sequential design
- Start with a minimum sample size (e.g. 10) and add data (e.g. 5 new points) until we reach the \(BF_{10}\) that we want.
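The stopping rule above can be sketched in Python. Note that `bf10_approx` uses a BIC-based approximation to the Bayes factor rather than the JZS Bayes factor computed by `ttestBF`, and all names here are my own:

```python
import math
import random
import statistics

def bf10_approx(x, mu=0.0):
    """BIC-based approximation to BF10 for a one-sample t-test.
    Not the JZS Bayes factor, but adequate for a sketch."""
    n = len(x)
    t = (statistics.mean(x) - mu) / (statistics.stdev(x) / math.sqrt(n))
    bf01 = math.sqrt(n) * (1 + t * t / (n - 1)) ** (-n / 2)
    return 1 / bf01

def sequential_test(sample_fn, n_min=10, step=5, n_max=100,
                    upper=10.0, lower=1/6):
    """Collect n_min observations, then add `step` at a time until
    BF10 crosses a boundary or n_max is reached."""
    x = [sample_fn() for _ in range(n_min)]
    while True:
        bf = bf10_approx(x)
        if bf > upper or bf < lower or len(x) >= n_max:
            return len(x), bf
        x.extend(sample_fn() for _ in range(step))

random.seed(1)
# Simulate one sequential study with a true effect of d = 0.5
n, bf = sequential_test(lambda: random.gauss(0.5, 1.0))
print(n, round(bf, 2))
```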
Sequential design
| 0.5 | 41 | 170 | 0.13 |
| 0.0 | 83 | 2765 | 2.95 |
Pure sequential design example
- The advantage is that this approach is guaranteed to eventually provide strong evidence.
- Expected sample sizes are much lower than those of the traditional design.
Costs for traditional designs:
| 0.5 | 72 | 0.0003 | 605 | 39600 |
| 0.0 | 232 | 0.0011 | 1949 | 127600 |
Costs for pure sequential design:
| 0.5 | 41 | 170 | 0.13 | 344 | 22550 |
| 0.0 | 83 | 2765 | 2.95 | 697 | 45650 |
Setting an upper limit
- Practically, there is often a limit.
- To prevent us from collecting thousands of participants for only one hypothesis, we can set an upper limit (i.e. maximal N).
- This limit can be based on:
- Money,
- Available participants or
- Time constraints.
Setting an upper limit
- We stop data collection at a sample size of 100.
| 0.5 | 39 | 98.05 | 0.12 | 1.83 |
| 0.0 | 58 | 80.25 | 2.31 | 17.44 |
Advantages of sequential designs
- Sequential designs are much more efficient, even without setting an upper limit.
- If planned generically (i.e. for no, small, medium or large effects), biased effect size estimates are not a problem.
- There is a high likelihood of obtaining strong evidence regardless of effect size.
- The interpretation of Bayes factors does not depend on stopping rules (unlike p-values; Schönbrodt & Wagenmakers, 2018).
- Sequential designs should be pre-registered.
But
- Things get messy if you have more than one main hypothesis.
Concluding words
The CBU should use sequential designs more often.
Further reading and material
- Special issue in Psychonomic Bulletin & Review volume: Bayesian methods for advancing psychological science
- Schönbrodt, F. D., & Wagenmakers, E.-J. (2018). Bayes factor design analysis: Planning for compelling evidence. Psychonomic Bulletin & Review, 25(1), 128–142. https://doi.org/10.3758/s13423-017-1230-y
- Etz, A., & Vandekerckhove, J. (2018). Introduction to Bayesian Inference for Psychology. Psychonomic Bulletin & Review, April 2017, 5–34. https://doi.org/10.3758/s13423-017-1262-3
- Wagenmakers, E.-J., Marsman, M., Jamil, T., Ly, A., Verhagen, J., Love, J., Selker, R., Gronau, Q. F., Šmíra, M., Epskamp, S., Matzke, D., Rouder, J. N., & Morey, R. D. (2018). Bayesian inference for psychology. Part I: Theoretical advantages and practical ramifications. Psychonomic Bulletin & Review, 25(1), 35–57. https://doi.org/10.3758/s13423-017-1343-3
- Rouder, J. N., Speckman, P. L., Sun, D., Morey, R. D., & Iverson, G. (2009). Bayesian t tests for accepting and rejecting the null hypothesis. Psychonomic Bulletin & Review, 16(2), 225–237. https://doi.org/10.3758/PBR.16.2.225
All code can be found here. Further points and more simulations can be found here.